HeteroCache: A Dynamic Retrieval Approach to Heterogeneous KV Cache Compression for Long-Context LLM Inference
arxiv.org·14h
A Novel Side-channel Attack That Utilizes Memory Re-orderings (U. of Washington, Duke, UCSC et al.)
semiengineering.com·38m
Why AI Needs GPUs and TPUs: The Hardware Behind LLMs
blog.bytebytego.com·2d
Weird RAM issue
68kmla.org·12h
Bringing Data Transformations Near-Memory for Low-Latency Analytics in HTAP Environments
arxiv.org·14h
Co-optimization Approaches For Reliable and Efficient AI Acceleration (Peking University et al.)
semiengineering.com·1h
Conversation: LLMs and the what/how loop
martinfowler.com·4h